Search method in a hierarchical object structure



The invention relates to a method of searching a set of objects for a predetermined
number of objects that are closest to an example. The invention also relates to a
computer program and to equipment comprising means for implementing such a search
method. Finally, the invention relates to a transmission system comprising such equipment.

The invention has useful applications in the field of the use of audio/video data.



Data transmission and storage capacities are increasing considerably, so that in a
great variety of fields, including the field of consumer electronics, users now have
difficulty managing the information at their disposal. In this context, object search
methods are growing ever more important.

United States patent 5,832,182 describes data partitioning methods and discusses the
value of such search methods. Data partitioning makes it possible to reduce the number
of comparisons to be made in a search, and thus the processing time necessary for the
search.

An object of the invention is notably to propose an efficient object search method
that uses a partitioning of the objects at several levels.

A search method according to the invention is characterized in that, for searching in a
set of objects a predetermined number of objects which are closest to an example, by
utilizing a multilevel partition which has a tree-like structure comprising nodes and
leaves, the nodes containing elements representing classes of objects and the leaves
containing objects, said method comprises the following steps:

a step of passing through said tree-like structure starting from a node and going to
the leaves by passing through the nodes whose representative elements are closest to the
example, for selecting one or more leaves,

a step of testing whether the number of selected leaves is lower than said
predetermined number of objects,

and, if the number of selected leaves is lower than said predetermined number of
objects, a new repetition of said steps starting from the brother node, closest to said
example, of the node passed through last.

The use of a multilevel partition is particularly advantageous for making a search,
because it further reduces the number of comparisons necessary for the search and thus
the processing time. It also makes it possible to process sets comprising a much larger
number of objects than with a single-level partition. Indeed, with a single-level
partition, a significant increase in the size of the set of objects leads either to an
increase in the number of classes or to an increase in the number of objects contained in
one class. In both cases the example searched for has to be compared to a much larger
number of objects, and the processing time increases considerably. With a multilevel
partition, on the other hand, the example searched for is only compared to a limited
number of objects at each level of the partition. The increase of the size of the set
thus has much less influence on the processing time of the search.
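The difference in the number of comparisons can be illustrated with a rough count. The sketch below is only an illustration under simplifying assumptions that are not taken from the description: classes are assumed to be of equal size, the multilevel tree is assumed to be balanced, and the names `flat_comparisons`, `multilevel_comparisons`, `branching` and `leaf_size` are hypothetical.

```python
import math


def flat_comparisons(n_objects: int, n_classes: int) -> int:
    """Single-level partition: compare the example to every class
    representative, then to every object of the chosen class
    (classes assumed to be of equal size)."""
    return n_classes + math.ceil(n_objects / n_classes)


def multilevel_comparisons(n_objects: int, branching: int, leaf_size: int) -> int:
    """Balanced multilevel partition: compare the example to `branching`
    representatives at each level, then to the objects of one final class."""
    levels = 0
    remaining = n_objects
    while remaining > leaf_size:
        remaining = math.ceil(remaining / branching)
        levels += 1
    return levels * branching + remaining


if __name__ == "__main__":
    for n in (10_000, 1_000_000):
        flat = flat_comparisons(n, n_classes=round(math.sqrt(n)))
        multi = multilevel_comparisons(n, branching=10, leaf_size=10)
        print(f"{n} objects: single-level {flat}, multilevel {multi}")
```

Under these assumptions the single-level count grows with the square root of the set size, whereas the multilevel count grows only with the number of levels.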

The invention advantageously proposes to pass through the tree-like structure 
of a multilevel partition. 

In an advantageous embodiment of the invention the predetermined number of objects is a
multiple of a predetermined number of results and said method comprises an additional
selection step for retaining from the selected leaves only a number of leaves equal to
said predetermined number of results, the retained leaves being those that contain the
objects that are closest to said example.



The partition of the objects results in a reduction of the number of comparisons to be
made for making a search. But it necessarily causes a deterioration of the results of the
search. This embodiment makes it possible to limit this deterioration. Indeed, by first
selecting a number of leaves higher than the desired number of results, and thereafter
making a complementary selection, for example, by an exhaustive comparison of the objects
contained in the selected leaves to the example searched for, the quality of the results
obtained is notably improved.

In a general way the invention may be applied to any type of object provided that a
measure of similarity f is defined for this type of objects, that this measure of
similarity is that which has been used for constructing the partition, and that it
verifies the following three conditions:

f is a function which associates a real number with two objects of the initial set,

this real number is identical whatever the order in which the two objects are
considered,

the real number associated with two identical objects is higher than the real number
associated with two different objects.
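The three conditions can be checked mechanically on a sample of objects. The following is a minimal sketch, assuming that objects are numeric feature vectors and that the third condition is read strictly (every pair of identical objects scores higher than every pair of different objects); the function names and the negated squared distance are illustrative choices, not part of the description.

```python
from itertools import product
from typing import Callable, Sequence, Tuple

Vector = Tuple[float, ...]


def neg_sq_distance(a: Vector, b: Vector) -> float:
    """Hypothetical similarity measure: negated squared Euclidean distance."""
    return -sum((x - y) ** 2 for x, y in zip(a, b))


def satisfies_conditions(f: Callable[[Vector, Vector], float],
                         objects: Sequence[Vector]) -> bool:
    """Check, on the given sample only, that f is symmetric and that
    identical pairs always score higher than different pairs."""
    pairs = list(product(objects, repeat=2))
    # Condition 2: the value does not depend on the order of the two objects.
    if any(f(a, b) != f(b, a) for a, b in pairs):
        return False
    # Condition 3: identical objects score higher than different objects.
    same = [f(a, b) for a, b in pairs if a == b]
    diff = [f(a, b) for a, b in pairs if a != b]
    return not diff or min(same) > max(diff)


if __name__ == "__main__":
    sample = [(0, 0), (1, 0), (3, 4)]
    print(satisfies_conditions(neg_sq_distance, sample))
    # A signed difference is not symmetric, so it fails the check.
    print(satisfies_conditions(lambda a, b: a[0] - b[0], sample))
```

Such a check only tests the sample it is given; it does not prove that a measure satisfies the conditions on the whole set.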

The objects are formed, for example, by metadata, that is to say, structures which
combine a set of data. Such metadata are, for example, descriptions of video shots,
notably descriptions of the MPEG-7 type. The MPEG-7 draft indeed defines a certain number
of descriptors for video shots (color descriptors, text descriptors, camera movement
descriptors, ...), and proposes similarity measures associated with these descriptors.
For more details reference is made to the document ISO/IEC JTC1/SC29/WG11 N3521
(July 2000) entitled «Coding of moving pictures and associated audio information», which
refers to the document «Visual Working Draft» version 4.0.

These and other aspects of the invention are apparent from and will be elucidated, by
way of non-limitative example, with reference to the embodiment(s) described hereinafter.



In the drawings:

Fig. 1 is a block diagram describing the operation of an example of a method of
partitioning a set of objects, which provides a multilevel partition which may be used by
a search method according to the invention,

Fig. 2 is a diagram of an example of a tree-like structure used for implementing a
search method according to the invention,

Fig. 3 is a block diagram describing the operation of an example of a search method
according to the invention,

Fig. 4 is a diagram of an example of equipment according to the invention, and

Fig. 5 is a diagram of an example of a transmission system according to the invention.

Fig. 1 shows a block diagram describing the operation of an example of a multilevel
partitioning method intended to produce a multilevel partition of the type used by a
search method according to the invention.

The partitioning method shown in Fig. 1 comprises the following steps:

(SS0): An initial partition PZ_0 is defined. This partition comprises a class C_{0,0}
which contains all the objects of the set X.

(SS1): A partition PZ_j is created for each class C_{j-1,k} (k = 1, ..., Q_{j-1}) of the
partition PZ_{j-1} which contains more than one object. This partition comprises Q_j
classes C_{j,1}, C_{j,2}, ..., C_{j,Q_j}.

(SS2): A representative element R_{j,1}, R_{j,2}, ..., R_{j,Q_j} is determined for each
class C_{j,1}, C_{j,2}, ..., C_{j,Q_j} of the partition PZ_j.

(SS3): These representative elements are stored in a tree-like structure TR in such a way
that each representative element R_{j,1}, R_{j,2}, ..., R_{j,Q_j} is a son of the
representative element of the class C_{j-1,k}.

(SS4): The steps (SS1), (SS2) and (SS3) are repeated until the partition PZ_j verifies a
predetermined criterion.

(SS5): When the predetermined criterion is verified, the objects of the classes C_{j,1},
C_{j,2}, ..., C_{j,Q_j} are stored so as to form the leaves of the nodes R_{j,1},
R_{j,2}, ..., R_{j,Q_j}, respectively.

In step (SS1) one may use, for example, a partitioning method of the «K-means» type as
described in the article «An efficient K-means clustering algorithm» by K. Alsabti,
S. Ranka and V. Singh, presented at the «IPPS/SPDP Workshop on High Performance Data
Mining, 1998, Orlando, Florida». A hierarchical partitioning method via agglomeration may
also be used, such as described in the introduction of the cited United States patent, or
a combination of the two methods, in which a partial agglomeration method is used for
initializing a «K-means» method.

The representative element of the class is, for example, the centroid of the class. For
determining the centroid of a class, first a fictitious element which has the same
similarity with all the elements of the class is calculated. The centroid is formed by the
element of the class that is closest to this fictitious element.
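A minimal sketch of this centroid determination is given below. It assumes that objects are numeric feature vectors and swaps in the arithmetic mean as a stand-in for the fictitious element; the mean does not in general have exactly the same similarity with every element of the class, so this is an approximation chosen for illustration. The names `similarity` and `class_centroid` are hypothetical.

```python
from typing import Sequence, Tuple

Vector = Tuple[float, ...]


def similarity(a: Vector, b: Vector) -> float:
    """Hypothetical similarity measure: negated squared Euclidean distance."""
    return -sum((x - y) ** 2 for x, y in zip(a, b))


def class_centroid(members: Sequence[Vector]) -> Vector:
    """Return the member of the class that is closest to a fictitious
    element, here approximated by the arithmetic mean of the members."""
    dim = len(members[0])
    fictitious = tuple(sum(m[i] for m in members) / len(members)
                       for i in range(dim))
    # The centroid is always a real member of the class, never the mean itself.
    return max(members, key=lambda m: similarity(m, fictitious))


if __name__ == "__main__":
    print(class_centroid([(0, 0), (1, 0), (5, 0)]))
```

Choosing a real member of the class as representative keeps the nodes of the tree comparable to the example with the same similarity measure as the objects themselves.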

The multilevel partitioning method is terminated (that is to say, the predetermined
criterion is considered verified) either when the number of objects per class is as close
as possible to a maximum value, or when the objects contained in the classes of the
partition PZ_j are sufficiently close to the centroid of the class.
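The overall partitioning loop can be sketched as a recursive construction. This is an illustration under stated assumptions, not the method of Fig. 1 itself: objects are assumed to be numeric feature vectors, each class is split into at most two subclasses by a crude two-seed assignment that stands in for a K-means step, and the termination criterion is a hypothetical maximum class size `max_leaf_size`. All names in the block are illustrative.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def similarity(a: Vector, b: Vector) -> float:
    """Hypothetical similarity measure: negated squared Euclidean distance."""
    return -sum((x - y) ** 2 for x, y in zip(a, b))


def representative(members: Sequence[Vector]) -> Vector:
    """Member closest to the mean of the class (approximate centroid)."""
    dim = len(members[0])
    mean = tuple(sum(m[i] for m in members) / len(members) for i in range(dim))
    return max(members, key=lambda m: similarity(m, mean))


def split(members: Sequence[Vector]) -> List[List[Vector]]:
    """Stand-in for a K-means step: pick two distant seeds and assign
    each member to the more similar seed (one pass, no iteration)."""
    seed_a = members[0]
    seed_b = min(members, key=lambda m: similarity(m, seed_a))
    left, right = [], []
    for m in members:
        target = left if similarity(m, seed_a) >= similarity(m, seed_b) else right
        target.append(m)
    return [c for c in (left, right) if c]


@dataclass
class Node:
    rep: Vector                                          # representative element
    children: List["Node"] = field(default_factory=list)
    leaves: List[Vector] = field(default_factory=list)   # objects of a final class


def build(members: Sequence[Vector], max_leaf_size: int = 2) -> Node:
    """Partition recursively until every class holds at most max_leaf_size
    objects, or cannot be split any further."""
    node = Node(rep=representative(members))
    classes = split(members) if len(members) > max_leaf_size else [list(members)]
    if len(classes) < 2:
        node.leaves = list(members)
    else:
        node.children = [build(c, max_leaf_size) for c in classes]
    return node


def all_objects(node: Node) -> List[Vector]:
    """Collect the objects stored in the leaves below a node."""
    found = list(node.leaves)
    for child in node.children:
        found.extend(all_objects(child))
    return found
```

For instance, `build([(0, 0), (0, 1), (10, 10), (10, 11), (20, 0)])` yields a root with two sons, and `all_objects` applied to the root returns the five input objects.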

Fig. 2 shows an example of a tree-like structure TR obtained with such a multilevel
partitioning method and which may be used for implementing a search method according to
the invention. The nodes of the tree are represented in dashed lines. They contain an
element that represents a class of the set of objects. The leaves of the tree are
represented in solid lines. They contain the objects x_1, ..., x_n of the set X.

Fig. 3 shows a block diagram describing the operation of an example of a search method
according to the invention for selecting a predetermined number of objects N in a
tree-like structure Y. According to Fig. 3, a search method according to the invention
comprises the following steps:

(T0): A variable NBO, which indicates the number of leaves that remain to be selected, is
initialized. Its initial value is equal to the predetermined number of objects to be
selected: NBO = N.

(T1): The number of leaves NBL(n) which depend on the current node n is determined. The
leaves which depend on a node are the leaves of this node as well as the leaves of the
nodes that depend on this node.

(T2): The number of leaves NBL(n) which depend on the current node is compared with the
number of leaves that remain to be selected NBO.

(T3): If they are the same (NBL(n) = NBO), the leaves depending on the current node n are
selected (this selection operation is denoted S(n, x_k) in Fig. 3), and the method is
terminated.

(T4.0): If the number of leaves NBL(n) is lower than the number of leaves that remain to
be selected (NBL(n) < NBO), the leaves depending on the current node n are selected
(S(n, x_k)).

(T4.1): The variable NBO, which indicates the number of leaves that remain to be
selected, is updated by subtracting the number of leaves NBL(n) from the current number
of leaves that remain to be selected: NBO = NBO - NBL(n).

(T4.2): The brother of the current node that is closest to the example, denoted NTEB(n),
becomes the new current node: n = NTEB(n), and the step (T1) is repeated.

(T5): If the number of leaves NBL(n) is higher than the number of leaves that remain to
be selected (NBL(n) > NBO), the son of the current node that is closest to the example,
denoted NTEC(n), becomes the new current node: n = NTEC(n), and the step (T1) is
repeated.
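The traversal described by these steps can be sketched as follows. The sketch rests on several assumptions that the description does not state: the search starts at the root, every leaf is modelled as a node without sons that holds one object (so that it counts as one leaf), brothers that have already been visited are skipped, and the requested number of objects does not exceed the number of leaves of the tree. The class `Node` and the functions `nbl`, `leaves_of`, `search` and `demo_tree` are illustrative names, and the negated squared distance is a hypothetical similarity measure.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Vector = Tuple[float, ...]


def similarity(a: Vector, b: Vector) -> float:
    """Hypothetical similarity measure: negated squared Euclidean distance."""
    return -sum((x - y) ** 2 for x, y in zip(a, b))


@dataclass(eq=False)
class Node:
    element: Vector                       # representative element, or the object
    sons: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def add(self, son: "Node") -> "Node":
        son.parent = self
        self.sons.append(son)
        return son


def nbl(n: Node) -> int:
    """Number of leaves that depend on node n (a leaf counts for one)."""
    return 1 if not n.sons else sum(nbl(s) for s in n.sons)


def leaves_of(n: Node) -> List[Vector]:
    """Objects held by the leaves that depend on node n."""
    return [n.element] if not n.sons else [x for s in n.sons for x in leaves_of(s)]


def search(root: Node, example: Vector, n_objects: int) -> List[Vector]:
    if not 1 <= n_objects <= nbl(root):
        raise ValueError("n_objects must be between 1 and the number of leaves")
    nbo = n_objects                       # (T0)
    n, visited, selected = root, set(), []
    while True:
        visited.add(id(n))
        count = nbl(n)                    # (T1), compared with NBO in (T2)
        if count == nbo:                  # (T3)
            return selected + leaves_of(n)
        if count < nbo:                   # (T4.0) to (T4.2)
            selected += leaves_of(n)
            nbo -= count
            brothers = [b for b in n.parent.sons if id(b) not in visited]
            n = max(brothers, key=lambda b: similarity(b.element, example))
        else:                             # (T5)
            n = max(n.sons, key=lambda s: similarity(s.element, example))


def demo_tree() -> Node:
    """Two classes of three objects each under a common root."""
    root = Node((5, 0))
    for rep, objs in (((1, 0), [(0, 0), (1, 0), (2, 0)]),
                      ((10, 0), [(9, 0), (10, 0), (11, 0)])):
        cls = root.add(Node(rep))
        for obj in objs:
            cls.add(Node(obj))
    return root
```

With the tree returned by `demo_tree()`, `search(root, (1.2, 0.0), 4)` selects the three objects of the nearest class and then descends into its brother to pick `(9, 0)`; the result is a set of candidates rather than a ranked list of exact nearest neighbours.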

Advantageously, the number of objects to be selected NBO is set equal to a multiple of
the number of results NBR desired by the user: NBO = α.NBR. In this case the search method
according to the invention comprises an additional step (T6) for retaining from the
selected α.NBR objects only the NBR objects that are closest to the example searched for.
For example, this additional selection, which is made in step (T6), consists of a
systematic comparison of the α.NBR objects contained in the selected leaves with the
example searched for.
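The additional selection of step (T6) amounts to ranking the candidates by similarity to the example. The following is a minimal sketch, assuming numeric feature vectors and a hypothetical similarity measure; `refine` is an illustrative name and `heapq.nlargest` is one possible way to keep the best NBR candidates.

```python
import heapq
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def similarity(a: Vector, b: Vector) -> float:
    """Hypothetical similarity measure: negated squared Euclidean distance."""
    return -sum((x - y) ** 2 for x, y in zip(a, b))


def refine(candidates: Sequence[Vector], example: Vector, nbr: int) -> List[Vector]:
    """Compare every candidate with the example and keep the nbr most
    similar ones, ordered from most to least similar."""
    return heapq.nlargest(nbr, candidates,
                          key=lambda obj: similarity(obj, example))


if __name__ == "__main__":
    # Four candidates selected by the tree traversal, two results wanted.
    candidates = [(0, 0), (1, 0), (2, 0), (9, 0)]
    print(refine(candidates, (1.2, 0.0), nbr=2))
```

The exhaustive comparison is limited to the candidates selected by the traversal, so its cost depends on the multiple chosen and not on the size of the whole set.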

The proximity of two objects is evaluated by using a measure of similarity f which
depends on the type of objects concerned, which is the one that has been used for building
the tree-like structure, and which satisfies the following three conditions:

f is a function which associates a real number with two objects of the initial set,

this real number is identical whatever the order in which the two objects are
considered,

the real number associated with two identical objects is higher than the real number
associated with two different objects.

The invention is notably applied to objects which are instances of descriptors defined
in the draft of the MPEG-7 standard, by utilizing the associated similarity measures which
are proposed in this draft of the MPEG-7 standard.

Fig. 4 shows an example of equipment according to the invention. This equipment is a
camera 1 which comprises video capturing means 2 (for example of the CCD type). The camera
1 also comprises a memory 3 for storing data and a memory 4 for storing computer programs,
a microprocessor assembly 5 for executing said programs, and a user interface 6 for
receiving commands given by the user and for supplying data to the user. The memory 4
notably contains a set PG1 of one or more programs for coding the captured video. This set
of programs PG1 notably delivers MPEG-7 descriptions of video shots, which are stored in
the memory 3. The memory 4 also contains:

a multilevel partitioning program PG2 for partitioning a set formed by several of said
MPEG-7 descriptions,

a search program PG4 according to the invention for searching in a tree-like structure
that contains said descriptions.

Fig. 5 shows a diagram of an example of a transmission system according to the
invention. Such a system comprises a data source 10, user equipment 20 and a medium 30 for
transporting signals between the data source 10 and the user equipment 20. The data source
10 is, for example, a video data source. The transmission medium, which transmits these
video data to the user equipment, is formed, for example, by a cable network, a
transmission network via satellite, a radio link, etc. The user equipment comprises a
receiving circuit 100 notably used for receiving data transmitted by the source 10, a
memory 110 for storing data, notably received data, a memory 120 which contains computer
programs, a microprocessor assembly 140 for executing said programs, and a user interface
160 for receiving commands given by the user and for supplying data to the user. The
memory 120 notably contains a program PG5 for putting together, based on received video
data, a database of objects which are MPEG-7 descriptions relating to video shots. It also
contains a program PG2 for multilevel partitioning of a set comprising objects of this
database, and a program PG4 according to the invention for searching objects in a
tree-like structure that contains said descriptions.




